


SIGNAL COUNTING FOR IN SITU HYBRIDIZATION 

FIELD OF THE INVENTION 

This invention relates to methods of counting probe signals in biological 
specimens, such as probe signals produced by in situ hybridization in cells or tissue 
sections. 



BACKGROUND OF THE INVENTION 

Recent advances in molecular medicine have provided a greater opportunity 
to understand the genetic basis of disease, as well as the cellular mechanisms of 
disease, and select appropriate treatments with the greatest likelihood of success. 
Such diagnostic and prognostic cellular changes include the presence of tumor 
specific cell surface antigens (as in melanoma), and genetic abnormalities (such as 
activated oncogenes in tumors). A variety of techniques have evolved to detect the 
presence of these cellular abnormalities, including immunophenotyping with 
monoclonal antibodies, in situ hybridization with probes, and DNA amplification 
using the polymerase chain reaction (PCR). 

One such technique for molecular diagnosis is in situ hybridization (ISH) in 
which labeled hybridizing agents (such as DNA, RNA, or single stranded or double 
stranded DNA probes) are exposed to intact tissue sections. The probes can be 
labeled by direct or indirect means. In direct labeling, the label (a chromophore) is 
integral to the probe. Direct labels include fluorescent dyes such as derivatives of 
rhodamine, fluorescein, and Texas Red, or enzymatic reporters such as horseradish 
peroxidase or alkaline phosphatase. Indirect labeling involves attaching a hapten 
(such as biotin, mercury or digoxigenin) to a probe through a linker. After
hybridization, the hapten is detected using a labeled antibody or other specific 
binding protein. 

When fluorescent dyes are used as labels, the technique is referred to as 
fluorescent in situ hybridization (FISH). Dyes such as fluorescein isothiocyanate 
(FITC) are often used to label a sequence of probe DNA in FISH. The probe 
hybridizes to a defined target nucleotide sequence of DNA in the cell, and the FITC 
fluoresces green when excited by a mercury arc lamp or Argon laser (in the case of
laser scanning microscopy), so that the labeled probe can be visually detected when
the probed tissue is viewed through a microscope. Each chromosome containing the 
target DNA sequence will produce a fluorescent signal (sometimes called a spot) in 
every cell when the specimen is illuminated with suitable excitation. For example, 
specimens hybridized with a probe hybridizing to a specific region on chromosome 
21 will produce two fluorescent signals in cells from normal individuals and three 
signals from Down's Syndrome (Trisomy 21) subjects who have an extra 
chromosome number 21. Alternatively, a fluorescent probe that hybridizes to an X
chromosome will give one fluorescent signal per cell in males (who have only one X 
chromosome) and two fluorescent signals in females (who have two X 
chromosomes). 

FISH is an excellent method for detection of gene copy number alterations in 
cancer and other diseases. FISH is a standard tool for analyzing congenital genetic 
alterations in clinical diagnostics (such as numerical chromosomal alterations, 
duplications, inversions and microdeletions). In cancer, characteristic gene 
amplifications or deletions are associated with the development and progression of 
the tumor. FISH analysis of gene amplifications in cancer can have prognostic and 
therapeutic significance, such as the detection of the HER-2 oncogene amplification 
in breast cancer. Detection of this oncogene by FISH was recently approved by the 
Food and Drug Administration as a diagnostic tool for human breast cancer. 

However, a limitation on the more widespread use of FISH technology has 
been that counting of the fluorescent spots is extremely tedious, inaccurate, often 
highly subjective, and subject to substantial intra-observer variability. It also 
requires a highly trained technician who can recognize the cells or tissue to be 
analyzed and recognize and count tiny fluorescent spots accurately. Finally, at most 
100 or 200 cells are typically analyzed per specimen, and in the case of gene 
amplification, far fewer than that (such as 20 per specimen). This can lead to
statistical inaccuracy in defining the correct copy number. 

A significant impediment to the accurate counting of fluorescent signals is 
that the probes hybridize throughout a three-dimensional nucleus, and the probe 
signals have to be counted from different focal planes for each nucleus. However, 
cells and probe signals can overlap in the two-dimensional view, and overlapping
signals are seen as a single signal. Such overlapping results in undercounting of
signals, which can make it appear that an amplified gene is less amplified, or that 
fewer copies of a normal gene are present. 

Given the tedium and subjectivity of signal counting, efforts have been made 
to automate this technique. For example, U.S. Patent No. 5,523,207 discloses a two
dye FISH method in which probes are labeled with a first dye (such as FITC) and a 
contour of the nucleus is labeled with a second dye (such as propidium iodide or PI). 
These two dyes allow the number of signals per visualized nucleus to be determined. 
However, automated FISH spot counting using such techniques has been limited
because FISH signals in the nuclei are often at different focal planes, resulting in
interfering out-of-focus light. Moreover, automated detection of nuclear boundaries
has been very difficult to perform in tissue sections. These factors have contributed 
to the inadequacy of existing algorithms for performing automated FISH. 

It would therefore be helpful to provide a method and device for improving
the accuracy of FISH spot counting.



SUMMARY OF THE DISCLOSURE 

In one embodiment disclosed herein, fluorescently tagged nucleic acid probe 
signals are counted in a region of interest in a biological specimen by determining a 
ratio of signals from a test probe to signals of a reference probe, and the region of
interest includes multiple cells. This is in contrast to prior approaches, in which
probe signals have been counted with reference to cells or nuclei, and in which 
automated methods have counted probe signals with respect to stained nuclear 
contours. 

In certain embodiments, the test probe may be a fluorescently labeled
polynucleotide (such as DNA or RNA) that hybridizes to the region of interest in a
gene, and the reference probe is a polynucleotide labeled with a different fluorescent
color, and which hybridizes to a reference target. The test probe may hybridize to a
gene that is implicated or suspected to be involved in a particular disease, such as
tumor development and progression. The reference probe may, for example,
recognize a centromere of the same chromosome on which the gene of interest is
contained. An increased ratio of test probe signals to reference probe signals would
then indicate an amplification in gene copy number, while a decrease in that ratio
would indicate a relative loss of the gene of interest (such as a gene deletion). By 
determining a ratio between the quantities of test and reference signals, the problem 
of measuring changes in gene copy number with respect to a cell or nucleus is 
avoided. Typically, a count of centromeres approximates a nucleus or cell count. 

Particularly accurate counting of FISH signals can also be accomplished by 
obtaining successive contiguous images of the region of interest to distinguish 
overlapping signals from the biological specimen. Without distinguishing the 
overlapping signals, they would otherwise obscure one another, and diminish 
accuracy of the spot count. In particular embodiments, the successive images are 
slices, such as digital microscopic optical sections from different depths of the 
biological specimen, obtained by confocal microscopy. The successive images are 
transformed by detecting and representing the positional values in each image of 
fluorescent emission signal segments, which make up the probe signal, as an array of 
digital values. Signal segments below a certain value (e.g. a threshold number of 
pixels) may be eliminated. The remaining digital signal segments may then be 
analyzed to combine contiguous fluorescent signal segments in successive optical 
sections into a single spot signal, which may be assigned to a particular optical 
section in which a strongest fluorescent signal segment is located, or a group of 
optical sections across which the contiguous signal segments have been detected. 
This localization of the fluorescent signal to a particular optical section allows 
overlapping spots to be distinguished, both in the axial and transverse dimensions of 
the three-dimensional representation. 
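
The merging of contiguous signal segments across optical sections can be sketched as a three-dimensional connected-component search. The following is a minimal illustration only, not the disclosed implementation: the function name `merge_segments`, the use of face-adjacent (6-connected) neighbors, and the single intensity `threshold` are assumptions made for this example.

```python
from collections import deque


def merge_segments(stack, threshold):
    """Group above-threshold voxels that touch in x, y, or z into spots.

    stack: list of optical sections, each a 2D list of intensities.
    Returns one dict per spot giving its voxel count and the index of the
    section that holds its brightest voxel.
    """
    depth, rows, cols = len(stack), len(stack[0]), len(stack[0][0])
    # Assumption of this sketch: only face-adjacent voxels are contiguous.
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen = set()
    spots = []
    for z in range(depth):
        for y in range(rows):
            for x in range(cols):
                if (z, y, x) in seen or stack[z][y][x] < threshold:
                    continue
                # Breadth-first search collects one contiguous spot.
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                voxels = []
                while queue:
                    cz, cy, cx = queue.popleft()
                    voxels.append((cz, cy, cx))
                    for dz, dy, dx in steps:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        if (0 <= nz < depth and 0 <= ny < rows and 0 <= nx < cols
                                and (nz, ny, nx) not in seen
                                and stack[nz][ny][nx] >= threshold):
                            seen.add((nz, ny, nx))
                            queue.append((nz, ny, nx))
                brightest = max(voxels, key=lambda v: stack[v[0]][v[1]][v[2]])
                spots.append({"size": len(voxels), "section": brightest[0]})
    return spots


# Four sections of one row each. Column 0 holds two spots that overlie one
# another but are separated by a dim section; column 2 holds a third spot.
stack = [
    [[5, 0, 6]],
    [[9, 0, 0]],
    [[1, 0, 0]],
    [[7, 0, 0]],
]
print(len(merge_segments(stack, threshold=3)))  # 3
```

In this toy stack the two signals in column 0 would coincide in a two-dimensional view, yet they are counted separately because the section between them falls below the threshold.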

The system detects the location of fluorescent spot signals in three-
dimensional space by applying a morphological top-hat transform to digital
images of the different levels to obtain fluorescent intensity spikes that indicate a
spot signal segment. A threshold level of fluorescence intensity is determined to 
eliminate signal segments that are below a fluorescence intensity that would be 
associated with a valid spot signal. Remaining contiguous spot signal segments are 
segmented into a single spot signal, and the single spot signal may be assigned a 
location at a level associated with a greatest fluorescent intensity signal segment. 
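
A white top-hat transform subtracts a morphological opening (erosion followed by dilation) from the image, which suppresses slowly varying background and leaves narrow intensity spikes. The sketch below illustrates the idea on a single optical section. The flat square structuring element, the `radius` parameter, and the function names are assumptions of this example; the disclosure does not specify them.

```python
def _window_filter(img, radius, pick):
    """Apply pick (min or max) over a square window clipped at the borders."""
    rows, cols = len(img), len(img[0])
    out = []
    for y in range(rows):
        row = []
        for x in range(cols):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(rows, y + radius + 1))
                    for i in range(max(0, x - radius), min(cols, x + radius + 1))]
            row.append(pick(vals))
        out.append(row)
    return out


def top_hat(img, radius):
    """White top-hat: the image minus its morphological opening."""
    eroded = _window_filter(img, radius, min)      # erosion
    opened = _window_filter(eroded, radius, max)   # dilation of the erosion
    return [[p - o for p, o in zip(img_row, open_row)]
            for img_row, open_row in zip(img, opened)]


def spot_pixels(img, radius, threshold):
    """Coordinates (row, column) whose top-hat response reaches threshold."""
    response = top_hat(img, radius)
    return [(y, x) for y, row in enumerate(response)
            for x, value in enumerate(row) if value >= threshold]


# Uneven background (10 on the left, 30 on the right) with one bright spike.
img = [[10, 10, 10, 30, 30] for _ in range(5)]
img[2][2] = 50
print(spot_pixels(img, radius=1, threshold=20))  # [(2, 2)]
```

A plain intensity threshold of 20 on this image would also select the brighter background on the right; the top-hat response retains only the spike.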







Certain disclosed embodiments also include devices for counting signals
from in situ hybridization of probes in biological tissue, or determining a three-
dimensional relationship between the signals, in which the device includes a
confocal microscope, a digital camera positioned to obtain digital optical section
images of different levels of the biological tissue, and a computer implemented
system that detects and combines contiguous or adjacent signal segments (which are
above a threshold) at the different levels, and separates vertically overlying,
transversely overlapping, or non-contiguous signals from one another. After
separating such signals from different levels into different spot signals, the computer
implemented system then counts the spot signals, or compares their relative
locations in three-dimensional space. Two or more different signal types (such as
two or more distinguishable fluorescent dyes) may be used, one color for the test
probe signal and a second color for the reference probe signal. The ratio of the
number of test probe signals to reference probe signals can then be determined, or an
unexpected overlap of the signals (as in a genetic translocation) can be assessed.

The test probe signals and the reference probe signals may be obtained
separately, for example by successively illuminating the tissue specimen with light
of different colors that selectively causes the different dyes to fluoresce, by viewing
the specimen through filters that filter out signals (such as a filter that removes
colors except the color of interest), or exposing separate contiguous tissue sections
to the different probes. Alternatively, the test and reference probe signals can be
obtained simultaneously, using multiple band-pass excitation and/or emission filters.
Once the test and reference probe spot signals have been counted, a ratio of test
probe signals to reference probe signals is calculated (without reference to
boundaries of a cell or nucleus) to determine whether there is a genetic alteration,
such as an alteration in gene copy number.

Alternatively, in certain disclosed embodiments, the FISH spots are 
accurately counted by obtaining the plurality of digital optical images at different 
levels of a biological sample, and constructing a three-dimensional image showing 
discrete fluorescent signal segments at different levels of the three-dimensional
image. The three-dimensional image is constructed by determining a location of a
fluorescent signal segment of a particular color in the different levels of the
biological sample, combining contiguous signal segments (which are above a pre-
selected threshold) into a single spot signal, and separating non-contiguous signal
segments into different spot signals. The location of signal segments in each level
is determined by the presence of a fluorescent brightness intensity spike that 
indicates an increase in image component intensity as compared to a background 
intensity. The locations of signals of different colors can be similarly resolved in 
three-dimensional space, the number of spot signals of each color counted, and a 
ratio of the spot signals determined. This three-dimensional imaging would also 
allow one to study the orientation or location of the spots in the nuclei to make 
conclusions about the presence of genetic rearrangements that do not change copy 
number of the signals, such as translocations and inversions. 

Certain features can be implemented to increase the accuracy of the spot 
counting. For example, under certain circumstances, a group of spots may be 
packed so closely together as to form a cluster that is difficult to analyze using a 
standard algorithm. Clusters can be automatically identified and counted using a 
cluster calibration feature, or a user can specify that a particular region of an image 
is a cluster. 

Another feature for increasing accuracy filters out false spots by, for 
instance, eliminating spots appearing at a same location in the test and reference 
probe images or filtering out image data indicating autofluorescence. In addition, a 
user interface can be presented to allow a user to either select areas of particular 
interest for separate processing or specify areas not to be processed. In this way, the 
invention can benefit from information provided by a human operator. 
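
One way to picture the false-spot filter is as a comparison of spot coordinates between the two probe images. The sketch below is illustrative only: the function name `drop_colocated`, the representation of a spot as an (x, y, z) tuple, and the `tolerance` distance are assumptions of this example.

```python
import math


def drop_colocated(test_spots, ref_spots, tolerance):
    """Discard spots found at nearly the same position in both images.

    Each spot is an (x, y, z) tuple. A spot is treated as false when a spot
    from the other image lies within tolerance of it.
    """
    def has_partner(spot, others):
        return any(math.dist(spot, other) <= tolerance for other in others)

    kept_test = [s for s in test_spots if not has_partner(s, ref_spots)]
    kept_ref = [s for s in ref_spots if not has_partner(s, test_spots)]
    return kept_test, kept_ref


test_spots = [(1, 1, 1), (10, 10, 2)]
ref_spots = [(1, 1, 1), (20, 5, 3)]
print(drop_colocated(test_spots, ref_spots, tolerance=1.0))
# ([(10, 10, 2)], [(20, 5, 3)])
```

A filter of this kind would presumably not be applied when co-localized signals are themselves the finding of interest, as in the translocation analysis mentioned above.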

In the copy number analysis, the ratio of signals can be a ratio of spot signals 
from a test probe that recognizes a gene of interest, and a reference probe that 
recognizes a chromosomal locus having an expected quantity in the biological 
specimen. The method can further include determining whether there is an increase 
in an expected ratio between the test signals and the reference signals, indicating an 
amplification of the gene of interest, or whether there is a decrease in the expected 
ratio between the test signals and the reference signals, indicating relative loss of the 
gene of interest. 
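
The comparison against an expected ratio can be sketched as follows. The function name `copy_number_call`, the default expected ratio of 1.0, and the 20% tolerance band are hypothetical values chosen for illustration; the disclosure does not state numerical cut-offs.

```python
def copy_number_call(test_count, ref_count, expected_ratio=1.0, tolerance=0.2):
    """Compare the test/reference spot ratio for a region of interest
    with an expected ratio.

    tolerance is a hypothetical fractional band around expected_ratio
    inside which the ratio is treated as unchanged.
    """
    if ref_count == 0:
        raise ValueError("no reference signals in the region of interest")
    ratio = test_count / ref_count
    if ratio > expected_ratio * (1 + tolerance):
        call = "amplification"
    elif ratio < expected_ratio * (1 - tolerance):
        call = "relative loss"
    else:
        call = "no change"
    return ratio, call


print(copy_number_call(300, 100))  # (3.0, 'amplification')
print(copy_number_call(105, 100))  # (1.05, 'no change')
```

Note that the counts are totals over the whole region of interest, so no cell or nuclear boundary enters the calculation.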






This method is particularly applicable to high throughput techniques for 
performing automated FISH analysis of a large number of tissue, cell, or other 
specimens. The specimens can, for example, be in a tissue or cell microarray that 
includes specimens from the same or different sources, such as tumors. In such 
embodiments, the method includes providing an array of biological samples, 
hybridizing the biological samples with a fluorescent test probe that hybridizes to a 
gene of interest in the biological samples and with a fluorescent reference probe that 
hybridizes to a chromosomal reference locus in the biological samples. Images are 
then obtained by confocal microscopy of contiguous sections at different depths of a 
plurality of the biological samples in the array, and fluorescent signal segments from 
the contiguous sections are detected. The contiguous signal segments in different 
sections are combined into a corresponding single spot signal, and the separate spot 
signals are resolved from one another. The distinct spot signals can then be counted. 

In particular embodiments, the tissue array can include an array of many 
different tissue specimens, for example at least 50 tissue specimens, but hundreds or 
even thousands of tissue specimens can be included in the microarray. The tissue 
arrays can be constructed by obtaining a plurality of donor specimens, placing each 
donor specimen in an assigned location in a recipient array, and obtaining a plurality 
of copies of the recipient array in a manner that each copy contains a plurality of 
donor specimens that maintain their assigned locations. For example, a set of tissue 
samples relating to a same type of tissue from a plurality of donor specimens can be 
included. However, tissue arrays made by any method are suitable for use with the 
method of counting FISH spot signals. 

The foregoing and other objects, features, and advantages of the invention 
will become more apparent from the following detailed description of disclosed 
embodiments which proceeds with respect to the accompanying drawings. 

BRIEF DESCRIPTION OF THE FIGURES 

FIGS. 1A and 1B are schematic side and top views illustrating the problem of
viewing three-dimensional FISH images in two dimensions. 



FIG. 2A is a schematic view illustrating the prior approach of counting two 
color FISH spot signals with respect to cell nuclei (the contours of the nuclei being 
shown around the spot signals). 

FIG. 2B is a view similar to FIG. 2A, but shows counting two color FISH 
spot signals and determining a ratio of the different colored signals in a region of 
interest (such as a field of view FOV) instead of with respect to cell nuclei. FIG. 2C 
is a photomicrograph of prostate tissue, illustrating a region of interest (ROI) within 
the tissue for purposes of calculating a ratio of test to reference probe signals. 

FIG. 3 is a schematic view illustrating two color FISH, in which signal 
locations have been determined in three-dimensional space. 

FIGS. 4A and 4B are flow diagrams illustrating one embodiment of an 
automated system for counting FISH spot signals in accordance with the present 
invention. 

FIG. 5A is a flow diagram illustrating an algorithm for counting the FISH
signals.

FIG. 5B is a series of optical sections 1-5, and a max image M, of prostate 
tissue subjected to two color FISH with a probe for the androgen receptor (labeled 
red) and a probe for the normal X chromosome (labeled green), in which red signals 
appear weaker than green signals. 

FIG. 6A is a composite figure showing a print-out of optical sections of a 
series of eight confocal images (cuts 1-8), and a view of the image that would be 
seen (max-image) when the series of eight overlapping images would be viewed 
from above as a two-dimensional image. 

FIG. 6B is a series of optical sections 1-5, and a max image M, of breast 
cancer tissue subjected to two color FISH with a ribosomal S6K probe (labeled red) 
and a probe for the chromosome 17 centromere (labeled green), in which red signals
appear weaker than green signals. The excess of red S6 kinase signals over green
reference signals indicates amplification of the S6 kinase gene.

FIG. 7 is a MATLAB graphical user interface (GUI) for displaying optical 
sections of a three-dimensional tissue specimen that has been subjected to FISH. 






FIG. 8 is a histogram of the number of occurrences of the various gray 
levels, or fluorescence intensities, in the max image, which is used to threshold valid 
pixels and eliminate non-informative image components. 

FIG. 9A is a print-out of a MATLAB GUI illustrating processing spot signal 
segments from different levels, and combining contiguous signal segments above a 
threshold into a single spot signal (which can be assigned to a particular level or 
levels) for purposes of counting a spot signal. 

FIG. 9B is a schematic view of a vertical side view of optical sections, 
illustrating how image components below a certain threshold are eliminated to 
separate otherwise vertically contiguous signal segments, and separate overlying 
signals from one another. 

FIG. 9C is a view similar to FIG. 9B, showing how transversely and 
vertically overlapping signal segments are separated by eliminating signal segments 
below a threshold, and grouping the remaining signal segments into discrete spot 
signals. 

FIG. 10 is a three-dimensional representation of signal segments in different 
optical sections, resolved into a spot in an assigned optical section, and a combined 
image of the signals projected as spheres into a max image, in which a volume of the 
spheres is proportional to an intensity of the signal. 

FIG. 11 is a view of a tissue section mounted on a slide, for processing by
FISH.

FIG. 12 is a view of a tissue microarray mounted on a slide, for processing 
by FISH. 

FIG. 13 is a flowchart showing an overview of an algorithm used to count 
FISH spots in a set of digital image slices. 

FIG. 14 is a data flow diagram of an algorithm used to count FISH spots in a 
set of digital image slices. 

FIG. 15A is an image showing FISH spots relating to the X centromere in 
normal prostate tissue, including false spots caused by autofluorescence. 

FIG. 15B is a graph showing intensities of spot candidates for purposes of 
removing spots exceeding a threshold intensity. 




FIG. 15C is a graph showing intensities combined with area for spot
candidates for purposes of removing spots exceeding a threshold. 

FIG. 15D is the image of FIG. 15A modified to emphasize spot candidates
identified as autofluorescent tissue particles. 
FIG. 16A is a scatter plot of spot candidates for an image relating to a FISH
experiment conducted for centromere 17.

FIG. 16B is a scatter plot of spot candidates for an image relating to a FISH 
experiment conducted for the gene HER-2 and identifies spot clusters on the plot. 

FIG. 17 is an illustration of a user interface for selecting areas of interest to 
be considered when calculating a spot count for an image.

FIG. 18 is an illustration of a user interface depicting a three-dimensional 
representation of a spot candidate. 



DETAILED DESCRIPTION OF SEVERAL EXAMPLES 

The present invention includes a method and apparatus for counting FISH
spots, and particularly an automated, computer implemented method and device that
helps avoid counting errors introduced by vertically or horizontally overlapping
spots in two-dimensional projections of three-dimensional tissue sections. One of
the problems solved by this invention is illustrated in FIG. 1A, which shows a
vertical section through a stack of contiguous horizontal confocal layers of a tissue
section that has been subjected to FISH. Hybridization probe signals are shown as
spots in this figure, in which it can be seen in a vertical section that the spots are
discrete and spaced along the z-axis. However, when viewed from above (in an x-y
plane) as shown at the bottom of FIG. 1A, the discrete spots overlap and cannot
clearly be counted as separate spots. FIG. 1B illustrates a similar problem in which
even more probe spots appear throughout the z-depth of the tissue section, but
appear as an indistinguishable blur when viewed in an x-y plane from above, as
shown at the bottom of FIG. 1B.
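
The loss of information in a view from above can be illustrated with a small numerical sketch. Here the two-dimensional view is modeled as a pixelwise maximum over the optical sections, which is an assumption of this example; the function name `max_projection` is likewise hypothetical.

```python
def max_projection(stack):
    """Collapse a stack of optical sections into one 2D image by taking,
    at each (row, column), the largest value found in any section."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(section[y][x] for section in stack) for x in range(cols)]
            for y in range(rows)]


# Two spots at the same x-y position, in the first and third sections.
stack = [
    [[9, 0, 0]],
    [[0, 0, 0]],
    [[8, 0, 0]],
]
print(max_projection(stack))  # [[9, 0, 0]]
```

The projection shows a single bright pixel, so the two signals separated along the z-axis would be counted as one.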

Another problem with the prior art is illustrated in FIG. 2A, which shows the
conventional technique of counting hybridization signals within each cell nucleus.
This view illustrates two color FISH, in which two probes are labeled with different
dyes that fluoresce with different colors. The red label (R) may, for example, be
attached to a probe that hybridizes to a gene of interest (such as a hormone receptor
gene that may be amplified in certain tumors). The green label (G), in contrast, may
be attached to a probe that hybridizes to a known chromosomal locus that is not
expected to vary in disease states (such as the centromere of a chromosome on
which the gene of interest is found). The red label (R) is illustrated by a gray color
in the schematic figure, while the green label (G) is represented by a darker color.

In a particular example, the gene of interest could be recognized by a probe
for a gene on the X chromosome labeled with spectrum orange (to provide an
orange-red spot signal), and the reference probe could be labeled with spectrum
green (to provide a green spot signal) for the centromere of the X-chromosome. A
single green signal (G) would therefore be observed in the nuclei of the schematic
representation of FIG. 2A from male cells (which have only one X chromosome),
while two green signals (G) would be seen in female cells. Amplification of the
gene of interest would be noted in certain cells in the schematic representation of
FIG. 2A in which there is an increase in the ratio of red signals (R) to green signals
(G).

As shown in FIG. 2A, it is conventional to count the number of signals in
each nucleus of a large number of cells. This approach has been adopted because
amplification or deletion of a gene occurs in large populations of cells, and
significant changes in copy numbers of genes are often only detected by examining a
large number of cells (for example, at least 200). Because the amplification has
been considered to be a nuclear event, a change in the copy number of a gene with
respect to each nucleus has been counted, both manually and in automated systems.
Such systems have been difficult to automate, however, because cells and nuclei
overlap (as shown by the overlapping nuclear contours in FIG. 2A), and the nuclear
contours have been difficult to reliably recognize in automated systems.

The present invention adopts a different approach which has been found to
be more accurate, and has the additional advantage of being more accurately
automated. This approach is shown in FIG. 2B, in which the ratios of probes are
determined without reference to the cells (or the nucleus) in which the probes are
contained. The nuclear contours highlighted in FIG. 2A are absent in FIG. 2B to
illustrate this difference. Hence, FIG. 2B shows the FISH spots of FIG. 2A in a
region of interest (such as the microscope field of view [FOV] shown in FIG. 2A),
but without reference to the nuclear contours illustrated in FIG. 2A. In accordance 
with the present invention, the ratio of test probes (R) to reference probes (G) in the 
region of interest is the ratio that is calculated. It has been found that this ratio 
provides sufficient information (over a sufficiently large number of cells in a region 
of interest) to be informative about the relative amplification or deletion of a gene of 
interest. 

A region of interest is any arbitrary informative region across which an 
informative ratio can be determined. In some instances, a region of interest is a 
microscopic field of view at low magnifications (e.g. 100-200X). An entire 
microscope field of view can then be used for the image capturing and analysis at 
400-1000X magnification (X40-100 objectives) for FISH analysis. The thickness of 
the tissue sections used for FISH analysis is the same as in sections routinely used 
for histopathological analyses, ranging from 4-10 µm in thickness.

Another example of a region of interest is an area which is selected for a 
specific analysis, as shown in FIG. 2C, which is a cross-sectional photomicrograph 
of tissue from a human prostate, as it would appear in a tissue microarray. In this 
figure, the region of interest (ROI) is the area of malignant cells that is outlined in 
black. A region of interest in this example is any homogeneous site of a specimen,
where most, if not all, cells carry a particular alteration. However,
contamination with non-altered normal tissue in the region of interest can be 
tolerated, if the copy number alteration in the abnormal cells is substantial. 

In some embodiments, to further improve the ability of the system to 
accurately determine the ratio of test probe signals to reference probe signals, a view 
of the probe signals in three-dimensional space is constructed, as shown in FIG. 3. 
This view shows the three-dimensional relationship of the test and reference probe 
signals, which are interspersed among one another in all three dimensions of the 
tissue section which has undergone FISH. This three-dimensional view of the 
section can be further processed by a computer implemented system, as described 
below, to automatically count signals of each color, and to obtain a ratio of the 
different colored signals. This approach is particularly useful for high-throughput
analysis, for example, of tissue microarrays such as those disclosed in PCT
publications WO US99/04000 and US99/04001. 

In yet other embodiments, the three-dimensional imaging can be used to 
detect many kinds of genetic rearrangements in cells other than deletions or 
amplifications. For example, differently colored probes for the bcr and abl genes 
could be used to detect a fusion of these genes which occurs in a genetic 
translocation associated with chronic myelogenous leukemia (CML). See Tkachuk 
et al., Science 250:559-562, 1990. Such differentially labeled probes will flank 
translocation breakpoints, and produce a fusion after the translocation has occurred, 
but will be at separate loci if the translocation has not occurred. However, unless 
proximity or distance between the differentially labeled spots can be determined in 
three-dimensions, a measurement of the fusion will be inaccurate. Hence the 
method of determining a three-dimensional relationship of probe spot signals within 
a region of interest (such as a nucleus or other region) will permit a large scale 
quantitative analysis of three-dimensional distances between any two probe signals. 
For example, the three-dimensional coordinates of a red signal are determined, and 
the three-dimensional coordinates of a green signal are determined, and the 
coordinates are then analyzed to determine if they are overlapping (suggesting a 
translocation that has moved them into contiguous genetic loci) or separate (on 
different genetic loci than they would be normally).
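
The coordinate comparison described above can be sketched as a nearest-neighbor distance test between red and green spots. This is a minimal illustration under stated assumptions: the function name `fused_pairs` and the cut-off `max_distance` are hypothetical, and coordinates are assumed to be already expressed in the same physical units along all three axes (the spacing between optical sections generally differs from the in-plane pixel size).

```python
import math


def fused_pairs(red, green, max_distance):
    """Pair each red spot with its nearest green spot and keep the pairs
    whose three-dimensional separation is at most max_distance.

    Spots are (x, y, z) tuples in common physical units.
    """
    pairs = []
    for r in red:
        nearest = min(green, key=lambda g: math.dist(r, g), default=None)
        if nearest is not None and math.dist(r, nearest) <= max_distance:
            pairs.append((r, nearest))
    return pairs


red = [(1.0, 1.0, 0.5), (5.0, 5.0, 2.0)]
green = [(1.2, 1.0, 0.5), (5.0, 5.0, 6.0)]
print(fused_pairs(red, green, max_distance=0.5))
# [((1.0, 1.0, 0.5), (1.2, 1.0, 0.5))]
```

The second red spot lies directly above or below a green spot in the x-y plane, so the two would appear fused in a two-dimensional view, but their separation along the z-axis excludes them from the fused pairs.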

EXAMPLE 1 
Device for Performing Automated FISH 

FIG. 4A shows such an automated spot counter 10 in accordance with the 
present invention. Briefly, the device 10 includes an automated optical microscope 
12 (such as a confocal microscope) having a motorized stage 14 for the movement 
of a slide 16 relative to the viewing region of the viewing portion 18 of the 
microscope, a camera 20 for obtaining electronic or digital images from the optical 
microscope, a processing system 22 for counting the spots, and a memory 24 and a 
high resolution color monitor 26 for the storage and display respectively of images 
processed by the device 10. 






In a disclosed embodiment, the classification device 10 is automated and 
computer implemented, and therefore also includes, in addition to the motorized 
stage 14, an automated apparatus for focussing, for changing lens objectives 
between high and low power, and for adjustment of the light incident on the slide, as
well as circuitry for controlling the movement of the motorized stage, typically in 
response to a command from the processing system. The microscope may also 
include an automated slide transport system for moving the slides containing the 
specimen to be classified on to and off of the motorized stage, and a bar code reader 
for reading encoded information from the slide. An example of a microscope 
performing at least some of these functions is manufactured by Carl Zeiss, Inc. of 
Germany, or Atto Instruments of Rockville, Maryland. In particular embodiments, 
the microscope is a confocal microscope from Atto Instruments, such as that shown 
in PCT WO 99/22261, a Laser Scanning Microscope LSM 510 from Carl Zeiss, 
Inc., or an Axioplan 2 microscope from Carl Zeiss, Inc., equipped with a CARV 
module available from Atto Instruments. An example of a camera 20 suitable for
use with the invention is a Quantix CCD camera available from Photometrics of
Tucson, Arizona.

The signal counting device 10 is shown in FIG. 4B with particular emphasis 
on the classification elements embodied in the processing system 22. The 
processing system 22 may include an image processor and digitizer 42, and a 
general processor 46 with peripherals for printing, storage, etc. The general 
processor 46 can be an INTEL PENTIUM microprocessor or similar microprocessor 
based microcomputer, although it may be another computer-type device suitable for 
efficient execution of the functions described herein. The general processor 46 
controls the functioning and the flow of data between components of the device 10, 
may cause execution of additional primary feature signal counting algorithms, and 
handles the storage of image and classification information. The general processor 
46 additionally controls peripheral devices such as a printer 48, a storage device 24, 
such as an optical or magnetic hard disk, a tape drive, etc., as well as other devices 
including a bar code reader 50, a slide marker 52, autofocus circuitry, a robotic slide 
handler, the stage 14, and a mouse 53. Although a single processor system is 
shown, the invention could also be carried out in a variety of other systems. 






The depicted devices include computer-readable media such as a hard disk to 
provide storage of data, data structures, computer-executable instructions, and the 
like. Although the description of computer-readable media above refers to a hard 
disk, other types of media which are readable by a computer, such as removable 
magnetic disks, CDs, magnetic cassettes, flash memory cards, digital video disks, 
and the like, may also be used. 

The image processor and digitizer 42 (hereinafter referred to as the image 
processor) digitizes images from the digital camera 20 and performs a primary 
algorithmic classification on the images to filter out unwanted information. The 
image processor 42 and the general processor 46 may each access read-only and/or 
random access memory, as would be readily apparent to one skilled in the art, for the 
storage and execution of software necessary to perform the functions described 
relative to that processing component. Further, each component 42 and 46 includes 
circuitry, integrated circuit chips, etc. for the control of communication or data 
transfer over the data bus 54, as well as other functions typical of similar processors. 

The steps performed by the computer implemented system are illustrated in 
FIG. 5A. The tissue is first prepared and subjected to FISH (60) in the conventional 
manner, for example, as described in Example 2. The system then obtains multiple 
confocal optical sections (62), and the successive images are transformed by 
detecting and representing positional values in each image of fluorescent emission of 
a particular color as an array of digital values. Localized increases (spikes) in the 
digital values may represent signals in the optical section (64). A histogram is then 
generated (66) which produces a saddle point (described in association with FIG. 8) 
that permits fluorescent intensity image components below a threshold value to be 
eliminated (68, 70). Among the remaining image components, those that are 
substantially contiguous are grouped into segments according to various criteria 
(72). Additional filtering can be performed (74), for example, to account for 
autofluorescence and spot clusters. 

With the spot signals then resolved in three-dimensional space, they are 
counted. Additionally, the counted signals can be combined into a single max 
image, which is similar to the two-dimensional view seen through a conventional 
microscope. The max image is a convenient view for rapidly reviewing the FISH 
results in two dimensions. 

In two-color FISH, signals from a second probe (usually of a different color) 
may also need to be detected (76). In that event, an optical filter can be changed to 
detect the new color, or an incident beam of laser light of a different color or 
wavelength can be directed at the tissue section. The signals of a different color are 
then resolved in three-dimensional space using steps 62-74 as previously described. 
Once no more signals of a different color are to be obtained, a ratio of the signals 
from the test probe to the reference probe is calculated to determine whether there is 
a gene copy alteration. 

EXAMPLE 2 

Example of Automated FISH Signal Counting From a Series of Confocal 
Images in High Throughput Analysis with Tissue Microarrays 

Gene amplification is an important mechanism for the up-regulation of critical 
genes involved in cancer initiation and progression. A number of important 
oncogenes have already been found to be activated by DNA amplification. These 
include the HER-2 (17q12), C-MYC (8q24), PRAD1/CYCLIN D (11q13), FGFR-1 
(8p12), and FGFR-2 (10q24) oncogenes. All of these are examples of genes for 
which alterations in gene copy number could serve as an indicator of disease onset 
or progression. 

This example uses tissue microarrays, of the type shown in PCT publications 
WO US99/04000 and 04001 (which are fully incorporated by reference) as a high 
throughput technique for efficiently performing FISH on hundreds of tissue sample 
specimens. An example of such a tissue microarray is also shown in FIG. 12. In the 
absence of an automated technique for counting probe signals in the tissue 
microarray specimens, it would take many hours for each of the tissue specimens in 
the array to be examined and scored. However, using the automated technique 
described herein, hundreds or even thousands of the tissue specimens in the array 
can be examined and scored in much less time. 







Prostate tumor microarray 

Formalin-fixed and paraffin-embedded tumor and control specimens were 
obtained from the archives of the Institutes for Pathology, University of Basel 
(Switzerland) and the Tampere University Hospital (Finland). The least 
differentiated tumor area was selected to be sampled for the tissue microarray. The 
specimens that were interpretable for at least one FISH probe included the 
following: I) transurethral resections from 32 patients with benign prostatic 
hyperplasia (BPH) to be used as controls; II) 223 primary tumors, including 64 
cancers incidentally detected in transurethral resections for BPH, 145 clinically 
localized cancers from radical prostatectomies, and 14 transurethral resections from 
patients with primary, locally extensive disease; III) 54 local recurrences after 
hormonal therapy failure, including 31 transurethral resections from living patients 
and 23 specimens obtained from autopsies; IV) Sixty-two metastases collected at the 
autopsies from 47 patients who had undergone androgen deprivation by 
orchiectomy, and had subsequently died of end-stage metastatic prostate cancer. 
Metastatic tissue was sampled from pelvic lymph nodes (8), lung (21), liver (16), 
pleura (5), adrenal gland (5), kidney (2), mediastinal lymph nodes (1), peritoneum 
(1), stomach (1), and ureter (1). 






Construction and sectioning of tissue microarrays 

The prostate tissue microarray was constructed using a tissue arraying 
instrument that created holes in a recipient paraffin block, and acquired tissue cores 
from the donor block by a thin-walled needle with an inner diameter of 0.6 mm, held 
in an X-Y precision guide. The cylindrical sample was retrieved from the selected 
region in the donor and extruded directly into the recipient block with defined array 
coordinates. A solid steel wire closely fitted in the tube was used to transfer the tissue 
cores into the recipient block. After the construction of the array block, multiple 
5 µm sections were cut with a microtome using an adhesive-coated tape sectioning 
system (Instrumedics, Hackensack, New Jersey). H&E stained sections were used 
for histologic verification of tumor tissue on the arrayed samples. 







Fluorescence in-situ hybridization (FISH) 

Two-color FISH to sections of the arrayed formalin-fixed samples was 
performed using a Spectrum Orange-labeled androgen receptor (AR) probe with 
corresponding FITC-labeled centromeric probes (Vysis, Downers Grove, Illinois). 
The hybridization was performed according to the manufacturer's instructions. The 
following tissue treatment protocol was developed to allow formalin-fixed tumors 
on the array to be reliably analyzed by FISH: the slides of the prostate tissue 
microarray were first deparaffinized, acetylated in 0.2 N HCl, incubated in 1 M 
sodium thiocyanate solution at 80 degrees C for 30 minutes and immersed in a 
protease solution (0.5 mg/ml in 0.9% NaCl) (Vysis, Downers Grove, Illinois) for 10 
minutes at 37 degrees C. The slides were then post-fixed in 10% buffered formalin 
for 10 minutes, air dried, denatured for 5 minutes at 73 degrees C in 70% 
formamide/2x SSC (SSC is 0.3 M sodium chloride and 0.03 M sodium citrate) 
solution and dehydrated in 70, 80, and 100% ethanol, followed by proteinase K 
(4 µg/ml phosphate buffered saline) (GIBCO BRL, Life Technologies Inc., 
Rockville, Maryland) treatment for 7 minutes at 37 degrees C. The slides were then 
dehydrated and hybridized. The hybridization mixture contained 3 µl of each of the 
probes and Cot1-DNA (1 mg/ml; GIBCO BRL, Life Technologies Inc., Rockville, 
Maryland) in a hybridization mixture. After overnight hybridization at 37 degrees C 
in a humid chamber, slides were washed and counterstained with 0.2 µM DAPI. 
FISH signals were scored with a Zeiss fluorescence microscope equipped with a 
double-band pass filter using x40-x100 objectives. The relative number of gene 
signals in relation to the centromeric signals was evaluated for a region of interest. 
Criteria for gene amplification were as follows: at least 3 times more test probe 
signals than centromeric signals per cell in at least 10% of the tumor cells. 
Test/control signal ratios in the range between 1 and 3 were regarded as low level 
gains, and were not scored as evidence of specific gene amplification. 
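
The scoring criteria above can be illustrated with a short sketch in Python. It is not part of the disclosed system. The function name, the per-cell input format, the skipping of cells without centromeric signals, and the use of a pooled ratio for the low level gain check are all assumptions made for illustration.

```python
def classify_amplification(cells, ratio_cutoff=3.0, min_fraction=0.10):
    """Classify a region from per-cell (test, centromere) signal counts.

    cells is a list of (test_signals, centromeric_signals) tuples, one per cell.
    The name, input format, and return labels are hypothetical.
    """
    # Assumption: cells without any centromeric signal cannot be scored.
    scorable = [(t, c) for t, c in cells if c > 0]
    if not scorable:
        return "not interpretable"

    # "At least 3 times more test probe signals than centromeric signals per cell"
    amplified_cells = sum(1 for t, c in scorable if t >= ratio_cutoff * c)
    # "... in at least 10% of the tumor cells"
    if amplified_cells / len(scorable) >= min_fraction:
        return "amplified"

    # Assumption: the low level gain check uses the pooled test/control ratio.
    total_test = sum(t for t, _ in scorable)
    total_control = sum(c for _, c in scorable)
    if 1.0 < total_test / total_control < ratio_cutoff:
        return "low-level gain"
    return "no amplification"
```

For instance, a region in which 2 of 10 cells show 6 test signals against 2 centromeric signals would be reported as amplified under these assumptions, while a region with a uniform 3:2 ratio would be reported as a low level gain.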

Output from the two color AR FISH is shown in FIG. 5B, which shows a series 
of optical sections 1-5, and a max image M that combines the fluorescent signal 
segments of the optical sections 1-5. A "signal segment" in an optical section is 
selectively combined with signal segments from other optical sections to make up a 
complete probe signal seen in the Max view. The red signal segments 
in the optical sections (and the red signals in the Max view) have weaker signals 
than the green signals. Any of the two-dimensional optical sections 1-5, or the max 
image M, can be analyzed to determine the ratio of red to green signal segments or 
signals. As shown in FIG. 5B, the ratio of red to green (seen as a ratio of weaker to 
stronger spots) is substantially 1 throughout the optical sections 1-5 and in the max 
image M. This indicates that there is not an amplification of the AR, at least in the 
tissue section illustrated in FIG. 5B. In a gray scale version of FIG. 5B, the green 
signals appear as brighter spots, and the red signals are more difficult to detect. 

Alternatively, a three-dimensional representation of the max image can be 
obtained by analyzing the optical sections that contain each colored signal segment, 
and separating spot signals that overlap in an axial (vertical) or transverse 
(horizontal) direction. The separated signals can then be counted, and a ratio of red 
to green signals more accurately determined. Examples of FISH output from the 
green labeled probe, obtained by the three-dimensional analysis system of the 
present invention, are shown in FIG. 6A, which illustrates FISH-signal counting 
from a series of confocal images. Cuts 1 through 8 are optical sections obtained by 
the confocal microscope filtered to obtain only the green signal segments at different 
levels, while the Max-image is a combination of the cuts 1-8 (and is similar to the 
superimposed images that would be obtained in conventional two-dimensional 
FISH). The total number of separate green signal segments counted in cuts 1-8 was 
393 (summing the number of signal segments shown in brackets above each section 
number 1-8). After automated analysis of the signal segments, with separation of 
vertically and horizontally overlapping signals, it was determined that there were 
283 unique signals. In a gray scale version of FIG. 6A, the green signals appear as 
slightly darker spots, such as the one jutting atop cut #6. 

EXAMPLE 3 
S6K Amplification in Breast Cancer 

In this example, the biological consequences of genomic rearrangements at 
17q23, a locus amplified in up to 20% of primary breast cancers as assessed by 
comparative genomic hybridization (CGH), were examined. An array of primary 
breast cancers was constructed, and used to determine S6K gene amplification 
frequencies in vivo. 

Two cohorts of primary breast cancers were studied using the tissue microarray 
analyses. The first microarray consisted of 372 ethanol-fixed primary breast 
cancers. The second microarray consisted of 612 primary breast cancers from the 
years 1985-1995, from patients with complete clinico-pathological information 
including an average of 5.4 years of follow-up. Both series of tumors were 
analyzed, with 668 cases being informative for all experimental and clinical data. 
Both tumor cohorts were obtained from the Institute of Pathology, University of 
Basel. The tumor samples included 73.3% ductal, 13.6% lobular, 3% medullary, 
2.6% mucinous, 1.5% cribriform, 1.4% tubular, 1.1% papillary carcinomas, 1.9% 
ductal carcinoma in situ, and 1.7% of other rare histological subtypes. The grade 
distribution was 24% grade 1, 40% grade 2, and 36% grade 3. The pT stage was pT1 
in 32%, pT2 in 51%, pT3 in 7%, and pT4 in 10%. 

A SpectrumOrange labeled PAC probe specific for S6K and a SpectrumGreen 
labeled chromosome 17 centromere probe (Vysis, Downers Grove, IL) were used for 
copy number analysis. Interphase FISH to breast cancer cell lines was done as 
previously described in Barlund et al., Genes Chrom. Cancer 20:372-376, 1997. 
The hybridizations were evaluated using a Zeiss confocal fluorescence microscope, 
and following the algorithm shown in FIG. 5A. 

The breast cancer tissue section showed a high-level S6K gene amplification 
with a higher number of red signals than green reference signals, as shown in FIG. 
6B. In a gray scale version of FIG. 6B, the red signals appear as brighter dots, such as the 
loose cluster of dots in the center of image 5. 



EXAMPLE 4 

Graphical User Interface for Three-Dimensional FISH Signal Counting 

Although the invention can be implemented in a variety of computing 
environments, the following examples are implemented in MATLAB, which is 
available from Mathworks of Natick, Massachusetts. FIG. 7 shows a graphical user 
interface (GUI) that is displayed by the system for processing the fluorescent images 
and counting signals associated with probes that have hybridized to a target nucleic 
acid sequence. The image displayed on the interface is an image of one of the 
optical sections of a tissue section that has undergone FISH (in this case slice=5 
refers to the 5th of 8 contiguous optical sections at successively deeper depths of the 
tissue section; stack=7 refers to the number of the tissue section on the tissue array 
that is being processed; and threshold=26 refers to a threshold value of signal 
intensity below which intensity values are eliminated from calculations). 

FIG. 8 is a histogram which illustrates how the threshold value (threshold=26) 
is determined in this example. The histogram plots the relative frequency of gray 
levels (which corresponds to signal intensity) across the pixels of the image of each 
optical section, where the x-axis is the frequency of occurrence, and the y-axis is a 
brightness value. This graph shows two modes, in which the first mode (the sharp 
spike) represents noise (primarily dark background), and the second mode (the 
broader peak) represents useful signals (such as brighter pixels associated with 
fluorescent intensity). To determine a saddle point (threshold value), the histogram 
is smoothed, and the derivative of the resulting graph is calculated. The point T=26 
is the point at which the derivative of the curve changes sign, and this is 
the value that is selected as the threshold. Pixels having a brightness below this 
level are eliminated from further data manipulation. 
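
The threshold selection just described can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB implementation referred to in this example. The moving-average smoothing, its window width, and the choice of the first valley after the noise peak are assumptions, as are all function names.

```python
def smooth(values, half_width=2):
    """Moving-average smoothing; the window is clipped at both ends of the list."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def saddle_threshold(counts, half_width=2):
    """Return the gray level of the first valley between two histogram modes.

    counts[g] is the number of pixels with gray level g. Returns None when
    the smoothed histogram has no valley (for example, a single mode).
    """
    s = smooth(counts, half_width)
    # Discrete derivative of the smoothed histogram.
    slope = [s[g + 1] - s[g] for g in range(len(s) - 1)]
    falling = False
    for g, d in enumerate(slope):
        if d < 0:
            falling = True          # descending from the first (noise) mode
        elif d > 0 and falling:
            return g                # derivative changes sign: valley at g
    return None


def keep_above(pixels, threshold):
    """Eliminate pixels whose brightness is below the threshold."""
    return [p for p in pixels if p >= threshold]
```

A flat stretch of zero slope between the two modes is skipped, so the returned level is the last gray level before the histogram starts to rise toward the second mode.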

FIG. 9A shows an image which is displayed of contiguous signal segments in 
different optical sections 1-8 of a tissue stack (which corresponds to a microarray 
spot). This FISH signal is designated spot 224, and the image intensity of the signal 
segments that are present on optical sections 1-8 can be seen in the small image 
panels across the bottom of the display. Each small panel 1-8 is a section of the 
three-dimensional space within the tissue section, and the number of pixels where 
signal is present is shown by the white boxes in the display panels. Although 
several three-dimensional representations of the signal segments are shown in FIG. 
9A, the center representation (with segments labeled 1a-8a) will be used for purposes 
of illustration to explain how the different signal segments are combined into a 
single spot signal, or how certain signal segments are discarded. 

The relatively small brightness signals (which correlate with relatively few 
white pixels) in panels 1, 2, 3 and 4 are mapped into correspondingly small 
three-dimensional geometric boxes 1a, 2a, 3a and 4a, having a volume proportional to the 
area of the illuminated pixels in the panels 1-4. The absence of any illuminated 
pixels in panel 5 corresponds to an absence of any geometric figure in space 5a (or 
can indicate that any pixels were below a threshold value and therefore not 
considered for further data manipulation). The area of the white pixels in panel 6 
corresponds to the volume 6a, and the area of the white pixels in panel 8 corresponds 
to the volume 8a, while the very large illuminated area in panel 7 corresponds to the 
figure 7a, which has the largest volume of any of the geometric figures. 

One function of the algorithm is to separate background noise from actual 
probe signals. This can be done by setting a threshold value, below which a signal 
segment can be considered not to be a valid signal segment. The signal segments in 
panels 1, 2, 3, 4 and 6 do not meet the threshold criteria for minimum illuminated 
pixels, and are therefore not considered when combining the signal segments into an 
actual signal. The signal segments in panels 7 and 8 do, however, meet the criteria 
for minimum pixel illumination. Since more pixels are illuminated in panel 7 than 
in panel 8, the location of the signal is assigned to level 7, although alternatively it 
could be considered to reside in both levels 7 and 8. 

Other signal overlap situations that the algorithm can resolve are also shown 
schematically in FIGS. 9B and 9C. In FIG. 9B a side view of optical sections 1-10 
is shown, in which two spot signals vertically (axially with respect to the microscope 
lens) overlap one another. The signal segments in sections 1, 5, 6 and 10 fall below 
a pre-selected threshold value, and are eliminated from further consideration. The 
remaining signal segments in sections 2, 3, 4, 7, 8 and 9 are above the threshold, and 
are grouped into a first spot signal at contiguous sections 2-4, and a second spot 
signal at contiguous sections 7-9. The probable location of the top spot signal can 
be assigned to section 3, where the strongest signal segment of the top spot signal is 
found, or alternatively the spot signal can be considered to reside at sections 2-4. 
The probable location of the bottom spot signal can be assigned to section 8, where 
the strongest signal segment of the bottom spot signal is found, or in sections 7-9 
across which the signal segments extend. 
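
The resolution of vertically overlapping signals in FIG. 9B can be sketched in Python as follows. The sketch assumes that each signal segment is summarized by its pixel count, that the threshold is a minimum pixel count, and that "strongest" means the largest pixel count. The function name and the pixel counts in the usage example are hypothetical.

```python
def group_sections(segment_sizes, min_pixels):
    """Group above-threshold segments in contiguous sections into spot signals.

    segment_sizes maps an optical section number to the pixel count of the
    signal segment found at one transverse location in that section.
    """
    # Segments below the threshold are eliminated from further consideration.
    kept = sorted(s for s, n in segment_sizes.items() if n >= min_pixels)

    # Runs of consecutive section numbers form one spot signal each.
    runs, run = [], []
    for s in kept:
        if run and s != run[-1] + 1:
            runs.append(run)
            run = []
        run.append(s)
    if run:
        runs.append(run)

    # Assign each spot to the section holding its strongest segment.
    return [
        {"sections": r, "assigned": max(r, key=lambda s: segment_sizes[s])}
        for r in runs
    ]
```

With hypothetical pixel counts `{1: 2, 2: 6, 3: 12, 4: 7, 5: 1, 6: 2, 7: 5, 8: 15, 9: 6, 10: 1}` and a minimum of 4 pixels, the sketch yields one spot spanning sections 2-4 assigned to section 3 and a second spot spanning sections 7-9 assigned to section 8, mirroring the arrangement described for FIG. 9B.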

FIG. 9C schematically illustrates how vertically and/or horizontally 
(transversely with respect to the optical axis of the microscope lens) overlapping 
spot signals can be resolved. A side view of optical sections 1-8 is shown in which 
the signal segments in sections 1 and 8 do not meet a threshold, and are eliminated. 
In section 4, there are two horizontally non-contiguous signal segments. The signal 
segment 4B (represented by a gray bar) does not meet a threshold value and is 
eliminated, while the segment 4A (represented by the black bar) does satisfy the 
threshold and remains for consideration. Similarly, there are two non-contiguous 
signal segments in section 5. The segment 5B (represented by the gray bar) meets 
the threshold, while the segment 5A (represented by the black bar) does not meet the 
threshold and is eliminated. Once the signal segments 4B and 5A are eliminated, the 
segments can be resolved into a top spot signal (comprising segments 2, 3 and 4A) 
and a bottom spot signal (comprising segments 5B, 6 and 7). 

FIG. 10 illustrates a three-dimensional representation of a tissue section that has 
been subjected to FISH and analyzed by the method of the present invention. The 
three-dimensional space is divided into x-y-z coordinates, in which the z axis is 
associated with successive optical sections 1-8 of the tissue section, and the signal 
segments of all of the signals in the section are illustrated. Signal segments are 
illustrated as small black cylinders, and the assigned location of each spot is 
illustrated as a gray colored larger cylinder, some of which are designated A1, A2, 
A3, A4, A5, A6, A7 and A8. Each of these gray cylinders A is associated with a 
corresponding sphere B in the max plane, in which the volume of each sphere B can 
be proportional to the volume of the A cylinder with which it is associated. 

The system additionally provides an opportunity for a user to provide 
guidance during spot counting. For example, the user can specify a particular area 
of interest by selecting it on the screen. Typically, the max image (e.g., image M of 
FIG. 5B) is presented, and the user may select an area or areas via a pointing device 
(e.g., a mouse). Counting is then limited to only the selected area or areas. Such a 
feature can be particularly useful when the user recognizes that a certain area of the 
image relates to a region of interest. 

The system also provides a way to eliminate a specified area or areas 
selected via a pointing device (e.g., a mouse). Portions of the image within the 
specified area or areas (sometimes called "gated areas") are ignored when spots are 
counted. 
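
The effect of selected and gated areas on the count can be illustrated with a small Python sketch. It assumes that each counted spot is reduced to an (x, y) location in the max image and that the user's areas are axis-aligned rectangles. Both are simplifications made for illustration, and the function names are hypothetical.

```python
def in_rect(point, rect):
    """True when point (x, y) lies inside rect (x0, y0, x1, y1), edges included."""
    x, y = point
    x0, y0, x1, y1 = rect
    return x0 <= x <= x1 and y0 <= y <= y1


def count_spots(spots, selected=None, gated=()):
    """Count spots inside the selected areas and outside every gated area.

    When no area is selected, the whole image is counted.
    """
    count = 0
    for p in spots:
        if selected and not any(in_rect(p, r) for r in selected):
            continue  # outside every area of interest
        if any(in_rect(p, r) for r in gated):
            continue  # inside a gated area, so ignored
        count += 1
    return count
```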







EXAMPLE 5 
Algorithm Overview 

The method of the present invention facilitates the analysis and quantitation 
of FISH on tissue microarrays, by overcoming previous obstacles to automation. 
This method, as implemented in the algorithm of this example, applies dual-color 
FISH with different gene probes and corresponding chromosomal reference probes 
to tissue microarrays. Because each tissue section is a three-dimensional volume, a 
confocal microscope system is used to generate a three-dimensional stack of a 
number of fluorescent images (e.g., 24 images) over defined tumor areas of each of 
the 4 µm thick tissue specimens on the tissue microarrays. The algorithm identifies 
signal segments on each level along the Z-axis, and differentiates signal segments 
which overlap vertically above the X-Y plane. 

The algorithm can be refined to account for spot clusters, and a variety of 
filtering mechanisms can be used to remove signals related to, for example, auto- 
fluorescence and other false signals. A variety of parameters used during the 
algorithm can be modified by the user to account for circumstances related to a 
particular image or set of images. For example, since the magnification used for an 
image can vary, the parameters of the algorithm can be modified to account for the 
relative size of a pixel. There are many other examples of parameters (e.g., numeric 
values) that can be varied to avoid a strict adherence to a particular value. Various 
results of intermediary calculations are stored to facilitate efficient re-calculation 
when parameters are modified by the user. 

An overview of an implementation of the algorithm is shown in FIG. 13. 
Typically, the algorithm is provided with a set of digital image slices (e.g., a raster 
image) representing a set of observations (e.g., of a biological specimen subjected to 
FISH) taken at different depths along a z-axis via a confocal microscope. Thus, the 
set of images is sometimes called a "stack." The stack typically consists of 
anywhere from 8-32 images, although more images may be used. Satisfactory 
results have been obtained with 24 images. 

At 1302, possible fluorescent image components are identified in the slices. 
For example, pixels meeting certain criteria are selected as possibly corresponding 
to a location within or near a spot and placed into a set of binarized images. The 
criteria can include an intensity level, and the images may be subjected to various 
filters before constructing the binary image. 

At 1304, the resulting pixels in each slice are projected onto a projection 
image. For example, a binary projection image can be constructed from a stack of 
binary images by setting pixels in the binary projection image at locations (X,Y) 
whenever there is any pixel set in any of the stack of binary images at the same 
(X,Y) location (regardless of the Z location). 

At 1306, insignificant regions in the image slices are discarded. So, for 
example, single lone pixels in a slice, or a contiguous set of 2-3 pixels are ignored, 
removed, or labeled as insignificant. A similar process can be applied to the 
projection image. 

At 1308, each contiguous region in the projection image is considered in 
turn. For example, minimal rectangles can be drawn around each substantially 
contiguous region in the projection image, numbered, and considered in turn. 

At 1310, contiguous regions in the image slices associated with the 
projection image region under consideration are analyzed and grouped into spot 
candidates. At 1320, the next region in the projection image is considered, until 
each has been considered. 

At 1322, filtering and augmentation can be applied to the candidate spots. For 
example, certain candidate spots may be false spots, or others may actually be 
clusters representing plural spots. 

Finally, at 1324, the spots are counted. The spot count can then be used 
vis-a-vis other spot counts to provide a ratio (e.g., a ratio of genes to centromeres). 

In this method, the traditional signals-per-nucleus approach is replaced by an 
overall gene-to-chromosome ratio in a defined tumor area. Therefore, the ratio 
between the signals of gene probes and the reference probes is calculated 
irrespective of the number of nuclei. This new method for automated capturing and 
quantitation of FISH signals significantly advances the utility of tissue microarrays 
as a high-throughput tool for detection of relative gene copy number alterations 
during cancer development and progression. 

EXAMPLE 6 
Details of Algorithm 

Although the invention can be implemented in a variety of computing 
environments, the following examples are implemented in MATLAB, which is 
available from Mathworks of Natick, Massachusetts. Various MATLAB-based 
Graphical User Interface (GUI) tools allow visualization of three-dimensional 
shapes of spots as well as their two-dimensional projections on the slices. The 
dynamic interface provides for manipulation of many input parameters for the 
threshold, number of slices, and filtering. It also provides for visualization of 
differently colored spots, storage, displaying and printing data in the form of 
different 3-D images, diagrams, and tables. 

An implementation of the algorithm in a scenario with 24 image slices can 
have the following features: 

(1) A morphological top-hat transform is applied to each of the 24 images to 
yield 24 outputs, each possessing brightness intensity spikes jutting above an 
essentially flat background. Each bright spike can correspond either to a signal 
segment, or to noise. 

(2) Each top-hat image is thresholded to produce a stack of 24 binary images 
showing spike locations. Morphological filters are applied to the binary images to 
eliminate noise, and touching spots are segmented. 

(3) Binary spot markers occurring as vertical neighbors in the stack are 
grouped into one final spot located at a particular assigned stack level. Various 
parameters of the algorithm are set to fit the physical characteristics of the images. 
These include window size for the top-hat transform, threshold levels, and sizes of 
filter structuring elements. 

The performance of the algorithm can be based on the following main steps: 

1. Given L + 1 slices of the fluorescent images, $X_0, X_1, \dots, X_L$, of sizes NxM, 
calculate the max-image $Y_{\max}$: 

$$Y_{\max}(j,i) = \max\{X_k(j,i);\ k = 0,1,\dots,L\}.$$






The histogram, H(t), of this image is calculated and smoothed, and then the 
approximate value of the threshold, T, is determined as the minimal saddle-point 
between two modes of the histogram. The histogram is displayed on the screen and 
the threshold can be changed on-line. The chosen value of the threshold is used in 
the next steps for computing binary images. 
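
Step 1 can be sketched in Python using nested lists as images. The representation of a slice as a list of rows, the function names, and the default of 256 gray levels are assumptions for illustration only.

```python
def max_image(stack):
    """Pixel-wise maximum over a stack of equally sized slices.

    stack[k][j][i] is the brightness of slice k at row j, column i.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    return [
        [max(slice_[j][i] for slice_ in stack) for i in range(cols)]
        for j in range(rows)
    ]


def gray_histogram(image, levels=256):
    """Count how many pixels of the image take each integer gray level."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return counts
```

The histogram of the max-image, once smoothed, is the input from which the saddle-point threshold T is read.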






2. For each image k = 0, 1, ..., L, the top-hat transform is calculated. 
Sometimes the top-hat transform is called "image minus opening." The calculation 
is 

$$Y_k = X_k - X_k \circ B, \qquad (1)$$

where $X_k \circ B$ is opening by the structuring element $B$, which is taken to be a circle 
fitting within a 7x7 square (roughly, a circle with a diameter of 7). Other structuring 
elements can be used (e.g., the following examples show a 5x5 square) and can be 
specified by the user via a user interface feature. The opening is defined by 
$X \circ B = (X \ominus B) \oplus B$, and calculated as follows: 

$$(X \ominus B)(j,i) = \min\{X(j+j_1,\, i+i_1);\ (j_1,i_1) \in B\}, \qquad (2)$$

$$(X \oplus B)(j,i) = \max\{X(j+j_1,\, i+i_1);\ (j_1,i_1) \in B\}. \qquad (3)$$



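To calculate the top-hat transform in (1), the decomposition of the transform 
by two top-hat transforms with 1x5 and 5x1 structuring elements is used, which 
results in the fast performance of the transform 5x5: 

$$X \ominus \begin{bmatrix} 1&1&1&1&1\\ 1&1&1&1&1\\ 1&1&1&1&1\\ 1&1&1&1&1\\ 1&1&1&1&1 \end{bmatrix} = \left( X \ominus \begin{bmatrix} 1\\1\\1\\1\\1 \end{bmatrix} \right) \ominus \begin{bmatrix} 1&1&1&1&1 \end{bmatrix}, \qquad (4)$$

$$Y \oplus \begin{bmatrix} 1&1&1&1&1\\ 1&1&1&1&1\\ 1&1&1&1&1\\ 1&1&1&1&1\\ 1&1&1&1&1 \end{bmatrix} = \left( Y \oplus \begin{bmatrix} 1\\1\\1\\1\\1 \end{bmatrix} \right) \oplus \begin{bmatrix} 1&1&1&1&1 \end{bmatrix}. \qquad (5)$$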



The data of the top-hat transform $Y_k$ is stored for future processing in case the value 
of the threshold needs to be changed. 
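
Equations (1)-(5) can be sketched in Python with a flat structuring element given as a list of (row, column) offsets. The handling of image borders is not stated in this example, so the sketch assumes that offsets falling outside the image are simply skipped. All function names are hypothetical.

```python
def _morph(img, offsets, pick):
    """Apply pick (min or max) over the structuring element at every pixel."""
    rows, cols = len(img), len(img[0])
    out = []
    for j in range(rows):
        out_row = []
        for i in range(cols):
            # Assumption: neighbors outside the image are ignored.
            values = [
                img[j + dj][i + di]
                for dj, di in offsets
                if 0 <= j + dj < rows and 0 <= i + di < cols
            ]
            out_row.append(pick(values))
        out.append(out_row)
    return out


def erode(img, offsets):       # equation (2)
    return _morph(img, offsets, min)


def dilate(img, offsets):      # equation (3)
    return _morph(img, offsets, max)


def square(size):
    h = size // 2
    return [(dj, di) for dj in range(-h, h + 1) for di in range(-h, h + 1)]


def column(size):
    h = size // 2
    return [(dj, 0) for dj in range(-h, h + 1)]


def row(size):
    h = size // 2
    return [(0, di) for di in range(-h, h + 1)]


def top_hat(img, offsets):     # equation (1): image minus opening
    opened = dilate(erode(img, offsets), offsets)
    return [
        [img[j][i] - opened[j][i] for i in range(len(img[0]))]
        for j in range(len(img))
    ]
```

Eroding with `column(5)` and then `row(5)` gives the same result as eroding once with `square(5)`, and likewise for dilation, which is the decomposition of equations (4) and (5). The two one-dimensional passes examine 10 neighbors per pixel instead of 25.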

3. Each top-hat transform $Y_k$ is thresholded by the value $T$, resulting in a 
binary image $B_k$, $k = 0,1,\dots,L$, 

$$B_k(j,i) = 1, \ \text{if}\ Y_k(j,i) > T,$$

$$B_k(j,i) = 0, \ \text{if}\ Y_k(j,i) < T.$$



A projection of multiple slices of the image can be calculated as 



* max 0\0 = l, ^ B k (j\i) = l, 

15 

for at least one k. Typically, the projection is calculated with reference to all 
available image slices, in which case the binary image B max is sometimes called the 
"projection binary image" because it is roughly equivalent to peering down through 
the stack of binary images. In some sense, the binary image B max is also a 
20 "maximum" image; however, in the example, the maximum is computed comparing 
only 1 's and 0's. 
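The thresholding and projection of step 3 can be illustrated as follows. This is a sketch under stated assumptions: the stack of top-hat transforms is held as a list of slices, each a list of rows, and the function names are hypothetical.

```python
def threshold_slices(top_hat_slices, T):
    """Binary image per slice: 1 where the top-hat value exceeds T, else 0."""
    return [[[1 if value > T else 0 for value in row] for row in slice_]
            for slice_ in top_hat_slices]


def projection(binary_slices):
    """Projection binary image: 1 where at least one slice has a 1 at (j, i)."""
    return [[1 if any(column) else 0 for column in zip(*rows)]
            for rows in zip(*binary_slices)]
```

Since the top-hat data is kept, changing T only requires calling `threshold_slices` again on the stored slices.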



4. The obtained binary images $B_k$ are processed to find the separate regions
revealed by the thresholded top-hat transform. Each actual spot is composed of
some number of these regions on different binary slices. Each of these slice regions
is referred to as a signal segment, so that each spot signal is composed of a union of
signal segments. The software can identify signal segments by locating substantially
contiguous groups of pixels, number the signal segments, and calculate the
following data:

A. coordinates of the signal segment center; 

B. the number of pixels composing the signal segment; 

C. the coordinates of the minimal rectangle which contains the signal segment. 
Signal segments composed of a single pixel are ignored during this procedure but 
their presence can be used during other steps. 

5. The projection binary image $B_{\max}$ is processed similarly, and its separate
regions (projection binary image regions) are identified by locating substantially 
contiguous groups of pixels and numbered. The following are calculated for spot 
signals on the projection binary image: 

A. coordinates of the projection binary image region center; 

B. the number of pixels (1's) composing the region;

C. the coordinates of the minimal rectangle which contains the region; 

D. the number of signal segments from the different optical sections, which lie 
inside the surrounding rectangle of the region; and the largest such signal 
segment is marked. 
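Locating contiguous groups of pixels and recording the data listed in steps 4 and 5 can be sketched as a simple region search. The sketch below makes several assumptions that are not stated in the text: "substantially contiguous" is taken to mean 8-connected, the center is the mean pixel position, and the rectangle is stored as (top, left, bottom, right). The function name and dictionary keys are hypothetical.

```python
from collections import deque


def find_segments(binary, min_pixels=2):
    """Return one record per 8-connected region of 1's in a binary image.

    Regions smaller than `min_pixels` are skipped, so the default ignores
    single-pixel segments.
    """
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    segments = []
    for j in range(height):
        for i in range(width):
            if binary[j][i] != 1 or seen[j][i]:
                continue
            seen[j][i] = True
            queue, pixels = deque([(j, i)]), []
            while queue:  # breadth-first search over neighbouring 1's
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(pixels) < min_pixels:
                continue
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            segments.append({
                "number": len(segments) + 1,
                "center": (sum(ys) / len(ys), sum(xs) / len(xs)),
                "pixels": len(pixels),
                "rect": (min(ys), min(xs), max(ys), max(xs)),
            })
    return segments
```

The same routine can be applied to each binary slice and to the projection binary image; item D of step 5 would then be obtained by counting the slice segments whose centers fall inside a projection region's rectangle.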

The algorithm then attempts to identify each of the spots by integrating the 
signal segments into spot signals via analysis of the sizes, intensities, and locations 
(in 3 dimensions) of the signal segments. Typically, for purposes of determining 
whether a particular signal segment is of sufficient size to indicate a contiguous 
connection, a threshold (e.g., 3 pixels) is useful to designate as a cutoff. So, for 
example, signal segments having 3 or fewer pixels can be ignored when determining 
whether the slice above and/or below a signal segment is contiguous with the 
particular signal segment. 
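One possible way to apply such a cutoff when joining signal segments across slices is sketched below. It is an illustration under assumptions, not the analysis actually used: each segment is given as a slice index plus a set of (j, i) pixel positions, two segments are treated as contiguous when they lie on adjacent slices and share at least one position, and segments of 3 or fewer pixels are left out. The function name is hypothetical.

```python
def link_segments(segments, min_pixels=4):
    """Group signal segments into spot candidates.

    `segments` is a list of (slice_index, set_of_(j, i)_pixels) pairs.
    Returns lists of segment indices, one list per spot candidate.
    """
    parent = list(range(len(segments)))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Segments at or below the cutoff do not take part in linking.
    big = [n for n, (_, px) in enumerate(segments) if len(px) >= min_pixels]
    for a in big:
        for b in big:
            slice_a, pixels_a = segments[a]
            slice_b, pixels_b = segments[b]
            if slice_b - slice_a == 1 and pixels_a & pixels_b:
                parent[find(a)] = find(b)

    groups = {}
    for n in big:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())
```

With this rule, a tiny segment lying between two larger ones does not bridge them, so two spots stacked at similar (X,Y) coordinates are reported as separate candidates.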

Specifying such a threshold can be particularly useful in resolving closely- 
neighboring spots, such as two spots having substantially similar (X,Y) coordinates 
but appearing in different slices of the image. Examples of such scenarios are 
shown in FIGS. 9B and 9C. As shown, there can be more than one spot in a vertical 
direction or in a horizontal direction. 





During processing, segments that are substantially vertically contiguous can
be associated with a particular optical slice having the greatest signal intensity,
which is often the center of a single spot. However, it may be that there is another
image slice with great signal intensity in the same segment, in which case there may
be more than one spot in the segment.

Typically, the information about a spot candidate includes a size (e.g., total 
number of pixels across the image slices) and an intensity (e.g., the brightest pixel in 
the region of the slice having the most pixels for the spot candidate). 

A data flow diagram for an implementation of the algorithm is shown in FIG.
14. The slice data 1402 is used to construct a max image 1404 and the top-hat slice
data 1406. The max image 1404 is useful for presenting to the user to give the user
an informative depiction of a combination of the image slices and for various user
interface features. It can be constructed by taking, for each (X,Y) location, the
pixel with the greatest intensity out of the image slices along the z-axis.

The top-hat slice data 1406 can be generated as described above. Max top-hat
image 1408 can be generated by finding the greatest intensity of the top-hat
slices for each (X,Y) location. The max top-hat image 1408 need not be displayed,
but is useful, in combination with the top-hat slice data 1406, to construct the
histogram 1410.
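The max image and the max top-hat image are both per-location maxima over a stack, and the histogram is a count of intensity values. A minimal sketch, assuming slices are lists of rows of integer intensities (function names are hypothetical):

```python
from collections import Counter


def max_projection(slices):
    """Greatest value at each (X,Y) location across a stack of slices."""
    return [[max(column) for column in zip(*rows)] for rows in zip(*slices)]


def histogram(image):
    """Number of pixels at each intensity value."""
    return Counter(value for row in image for value in row)
```

Applied to the slice data, `max_projection` gives an image like the max image; applied to the top-hat slices, it gives an image like the max top-hat image, whose histogram can then be inspected when choosing a threshold.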

A threshold 1412 is chosen with reference to the histogram 1410 as
described above. Then, the thresholded slice data 1420 can be constructed by
generating a binary image for each of the slices, where pixels in the slices are set if
the slice image in the slice data 1402 exceeds the threshold 1412.

From the thresholded slice data 1420, substantially-contiguous regions of the
thresholded slice data can be identified to generate regions 1422. The regions are
filtered to produce filtered regions 1424 of the thresholded data.

Also from the thresholded slice data 1420, a projected binary image 1426 can
be constructed, and substantially-contiguous regions 1430 within the projected
binary image 1426 found. These regions can be filtered to generate filtered regions
1440 of the projected image.

Z-axis analysis data 1428 can be generated with reference to the filtered
regions 1424 and the regions 1430 and used to generate the candidate spots 1432,
from which false spots 1434 and clusters 1442 can be identified. Additional
filtering can be performed to generate spot count 1444. Finally, display data 1450
can be used to present graphical results to the user.

The data flows in the diagram are general only, and often data in upstream
data components can be relied upon by downstream data components (or vice versa),
even if not specifically shown in FIG. 14. As the data components are generated,
they are often saved to permit efficient re-calculation when a parameter is adjusted
to account for a particular image scenario.

In addition to the above-described filters, additional filtering mechanisms
can be provided to improve accuracy of spot counting. For example, one feature
relates to removing false spots relating to, for example, autofluorescence. Two
separate approaches remove spots based on intensity alone and on intensity in
combination with area. For example, after candidate spots are identified, an
intensity of the spot can be calculated. If the intensity exceeds a certain threshold
(say, two standard deviations above the mean intensity), the spot can be discarded
and not counted. In addition, or alternatively, the intensity can be combined with
(e.g., multiplied by) the area or volume of a spot. Again, if the value is over a
particular threshold, the spot can be discarded and not counted. The approaches can
be combined.
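The two filtering approaches can be sketched as follows. The sketch assumes that each spot candidate is summarized as an (intensity, area) pair, that the population standard deviation is used (the text does not say which form), and that a candidate is kept only if it passes both tests. The function names are hypothetical.

```python
from statistics import mean, pstdev


def split_by_rating(ratings, n_sigma=2.0):
    """Return (kept, discarded) index lists using mean + n_sigma * std as cutoff."""
    cutoff = mean(ratings) + n_sigma * pstdev(ratings)
    kept = [n for n, r in enumerate(ratings) if r <= cutoff]
    discarded = [n for n, r in enumerate(ratings) if r > cutoff]
    return kept, discarded


def filter_autofluorescence(spots, n_sigma=2.0):
    """Indices of spots that pass both the intensity and intensity-times-area tests.

    `spots` is a list of (intensity, area) pairs.
    """
    kept_intensity, _ = split_by_rating([s[0] for s in spots], n_sigma)
    kept_product, _ = split_by_rating([s[0] * s[1] for s in spots], n_sigma)
    return sorted(set(kept_intensity) & set(kept_product))
```

The `n_sigma` parameter plays the role of the adjustable threshold, so a user-facing control could simply change that value and re-run the filter.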

For example, the image in FIG. 15A illustrates spot candidates in an image
of normal prostate tissue in which the X centromere has been labeled via FISH.
Some of the spot candidates are actually autofluorescent tissue, and can be
eliminated from the spot count. The image can be presented as part of the user
interface.

The graph in FIG. 15B shows the intensity of the 149 spot candidates
identified in the image. A mean and standard deviation of the spot candidates'
intensities are calculated, and used to set a threshold (e.g., two standard deviations above
the mean). Spot candidates having intensities above the threshold are discarded
(e.g., identified as "small autofluorescent tissue particles").

The graph in FIG. 15C is similar to that of 15B, but also includes an area
component (e.g., by multiplying the intensity by the number of pixels). Again,
spot candidates having a rating above the
threshold are discarded (e.g., identified as "large autofluorescent tissue particles"
and not included in the spot count). Eight candidates can be removed because they
exceed the threshold. The threshold in either case can be adjusted manually by a
user if the default setting is inappropriate.
FIG. 15D shows the image of 15A, but the small autofluorescent tissue
particles, large autofluorescent tissue particles, and the true FISH signals are
differentiated by presenting each as a different color. The image can be presented as
part of a user interface for the system. Eliminating the autofluorescent particles
typically leads to more accurate spot count results. In a color version of the image,
the image components 1510, among others, are portrayed in the color green to
indicate that they are considered true FISH signals. The image components 1520
and 1521, among others, are portrayed in the color yellow to indicate that they are
small autofluorescent tissue particles. The image components 1530-1533, among
others, are portrayed in the color blue to indicate that they are large autofluorescent
tissue particles. A variety of other colors or other ways of emphasizing and
differentiating the image components related to autofluorescence can be used.

Still another feature that can be implemented is removal of spots appearing
at substantially identical positions in two channels (e.g., red and green channels
generated in two separate images, such as an image for a test probe and an image for
a reference probe). Typically, such a situation means the spot can be ignored (e.g.,
not counted for either image).
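A minimal sketch of this two-channel check follows. It assumes each channel's spots are given as (x, y) centers and that "substantially identical" means within a small distance `tol` in pixels; both the tolerance and the function name are hypothetical.

```python
def drop_colocated(red, green, tol=1.0):
    """Remove spots whose centers lie within `tol` pixels of a spot in the other channel.

    Returns the remaining (red, green) spot lists.
    """
    def close(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= tol ** 2

    red_kept = [p for p in red if not any(close(p, q) for q in green)]
    green_kept = [q for q in green if not any(close(q, p) for p in red)]
    return red_kept, green_kept
```

Both lists are filtered against the original, unfiltered other channel, so a co-located pair is removed from both images.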

Yet another feature relates to identifying clusters and estimating the number
of spots in a cluster. In some cases, certain criteria (e.g., the size of a projection
binary image region or the total number of pixels in slice regions related to a
projection binary image region) indicate that a cluster of spots is present. In addition, the
software can be configured to accept user guidance as to which portions of the
image are a cluster (e.g., via selection by pointing device). Determining the number
of spots in a cluster can be difficult; however, one way of providing accurate
estimates is to provide a calibration mechanism.

For example, in one instance, 200 sets of clusters were analyzed manually to
determine how many spots were in the clusters. These clusters were then subjected
to the above-described algorithmic analysis. Under certain circumstances, a factor
of 2.5 was determined to be appropriate for determining how many spots were in a
cluster. Thus, for example, if an area is identified as a cluster and the standard
algorithmic analysis shows 4 spots, an estimate of 10 is provided. Other, more
complex, analysis can be done, such as, for example, counting the total number of
pixels associated with the signal segments related to a projection binary image
region.

For clusters, a mapping between spots detected and actual spots can be used.
Sometimes detection of a small number of spots (e.g., 3) may in fact indicate a tight
cluster of (e.g., 9) spots, while detection of a larger number of spots (e.g., 7) may be
a loose cluster of (e.g., 9) spots. Therefore, a uniform gain factor is not appropriate.
The appropriate method for handling clusters may vary depending on the target
being counted. For example, certain genes have a higher propensity for clustering
than do centromeres. The user can manually adjust the cluster detection and
counting parameters.
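The two estimation approaches, a uniform factor and a mapping, can be combined in a small sketch. The mapping values used below are only the illustrative figures from the text (3 detected or 7 detected, each corresponding to 9 actual spots), not calibration results, and the function name and rounding rule are assumptions.

```python
def estimate_cluster_spots(detected, calibration=None, gain=2.5):
    """Estimate the number of actual spots in a cluster.

    If `calibration` (a dict of detected -> actual counts) has an entry for
    `detected`, that entry is used; otherwise the uniform `gain` is applied
    and the result is rounded half up.
    """
    if calibration and detected in calibration:
        return calibration[detected]
    return int(detected * gain + 0.5)


# Uniform factor: 4 detected spots give an estimate of 10.
uniform_estimate = estimate_cluster_spots(4)

# Mapping: illustrative values only, taken from the example in the text.
example_mapping = {3: 9, 7: 9}
mapped_estimate = estimate_cluster_spots(3, example_mapping)
```

Keeping the mapping as data rather than code would let the values be adjusted per target, for example separately for genes and centromeres.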

The number of spot signals is counted and the results are provided to the user.
The interactive options of the program allow the threshold T to be changed and
the spot counting of the fluorescent images to be re-processed from step 3, avoiding the
repeated calculation of the top-hat transforms.

The output data using this algorithm can include: 

1. Original images of all slices.

2. Max-image. 

3. Histogram of Max-image.

4. Max-image with colored spots. 

5. Top-Hat transforms of the images. 

6. Number of spots. 

7. Max-image with colored spots after filtering out spots with less than some 
specified number of pixels. 

8. Original images with colored spot-cuts. 

9. Images with colored rectangles surrounding the spots. 

10. 3-D view of each spot with sections lying on slices by vertical. 

11. 2-D view of spot composition by projections.







12. 3-D view of a number of spots with their projections by slices. 

13. Table of data for spot-cuts for L slices, which consists of:

# spot-cut   # slice   x_center   y_center   # pixel   x_left   x_right   y_bottom   y_top   r.v.l.
24           4         160        124        24        160      164       120        125     1

where r.v.l. is the abbreviation of relative vertical location, and x_left, x_right, y_bottom, and y_top
are the coordinates of the rectangles surrounding the spot-cuts. The last box
indicates whether the spot-cut is below, on, or above the true spot.

14. Table of data for spots on the Max-image, which consists of: 



# spot-cut   x_center   y_center   # pixel   x_left   x_right   y_bottom   y_top
224          182        164        42        180      186       160        167



EXAMPLE 7

Scatter Plot of Spot Candidates and Calibration Features 

An additional user interface can be presented for assisting in analyzing and
manipulating the data related to FISH experiments. For example, FIGS. 16A and
16B show scatter plots generated for a FISH experiment in which the presence of
centromere 17 and gene HER-2 were detected to generate a ratio indicating possible
amplification of HER-2.

Spots relating to centromere 17 typically do not cluster and are shown in
FIG. 16A. Each spot candidate is plotted at a position corresponding to
its intensity and size (e.g., number of pixels). The user can set a threshold
parameter (e.g., 4) so that candidate spots having fewer than the specified number of
pixels are excluded from the spot count. Also, a minimum intensity parameter can
be set.

Spots relating to HER-2 are shown in FIG. 16B and are more difficult to
count due to potential clustering. For purposes of illustration, areas of the plot
roughly relating to background noise 1652, real signals 1654, and clusters 1656 are
circled. To facilitate determining an appropriate threshold, the user is presented
with a calibration feature. An image (e.g., the maximum image 1404) is presented
to the user, who can then verify that certain spot candidates are indeed spots,
although they are of minimal intensity. The user selects the spot candidate shown
and activates a user interface element indicating the spot candidate is of minimal
intensity.

Subsequently, when the scatter plot 1650 of FIG. 16B is shown, the spot 
candidates of minimal intensity are shown in red to facilitate choosing an 
appropriate minimum intensity threshold (e.g., by adjusting an appropriate 
parameter in the algorithm), below which spot candidates are not counted as spots. 
Thus some of the points 1658, typically near the bottom of the plot, are red. The 
minimum intensity threshold is typically adjusted to barely include the minimum 
intensity candidates. 
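Adjusting the threshold to "barely include" the user-marked candidates amounts to taking the lowest marked intensity. The sketch below assumes candidates are summarized as (intensity, pixel count) pairs and that a candidate is counted when it meets both the minimum intensity and the minimum pixel count; the function names are hypothetical.

```python
def minimum_intensity_threshold(marked_intensities):
    """Highest threshold that still counts every user-marked candidate."""
    return min(marked_intensities)


def count_spots(candidates, min_intensity, min_pixels=4):
    """Count candidates at or above both cutoffs.

    `candidates` is a list of (intensity, pixel_count) pairs.
    """
    return sum(1 for intensity, pixels in candidates
               if intensity >= min_intensity and pixels >= min_pixels)
```

For instance, if the user marks candidates with intensities 12 and 15 as real but minimal, the threshold becomes 12, and any candidate dimmer than that is treated as background.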

Also useful is the illustration of clusters 1656, which can be used to adjust 
the cluster selection criteria. Additionally, the user can at any time click on a 
particular spot candidate (e.g., 1660) to navigate to information about the spot 
candidate, including a three-dimensional view of the spot candidate. At the user 
interface, the user can then manually designate the spot candidate as a cluster or 
non-cluster. 

The scatter plot 1650 thus presents a useful tool for allowing the user to more 
easily grasp the totality of the image data and easily navigate to appropriate user 
interface screens for analyzing and adjusting the data, as well as calibrating the 
system. 

EXAMPLE 8 
User Direction of Areas to be Analyzed 

The system can also provide a user interface by which the user can guide the 
selection of areas to be analyzed. Although the system typically discards clearly 
aberrant data (e.g., a substantially total black or high-intensity image slice), the 
system may benefit from assistance from the user in determining which areas of an 
image are of interest or which are not. As described below, the user can select an 
area to be counted separately or an area not to be counted. 

FIG. 17 shows a user interface 1702 presented to the user for designating
areas of interest. Typically, the user interface 1702 presents a max image (e.g.,
1404). By using a pointing device (e.g., a mouse or trackball), the user can
designate a particular area 1706 as one to be processed separately. Multiple areas
can be added to an area set and processed separately. In this way, the user can direct
the system to focus on particularly relevant portions of the image.

The system also supports designating areas as not to be processed or
included in the spot count. The algorithm can store the areas in a list and, for
example, not count spots at a location (e.g., X, Y coordinates) within the area(s)
designated.
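Storing excluded areas in a list and skipping spots inside them can be sketched as below. The sketch assumes, for simplicity, that each area is an axis-aligned rectangle given as (x_left, y_bottom, x_right, y_top); areas drawn by the user could have other shapes. The function names are hypothetical.

```python
def in_rect(point, rect):
    """True if the (x, y) point lies inside or on the border of the rectangle."""
    x, y = point
    x_left, y_bottom, x_right, y_top = rect
    return x_left <= x <= x_right and y_bottom <= y <= y_top


def spots_outside(spots, excluded_areas):
    """Spots whose coordinates fall in none of the excluded areas."""
    return [p for p in spots
            if not any(in_rect(p, area) for area in excluded_areas)]
```

The same membership test, with the condition inverted, could select the spots inside an area designated for separate processing.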

The above feature is useful because areas of the image can sometimes be
identified as debris, stroma, connective tissue, or blood vessels, which tend to
interfere with determining an appropriate spot count in certain scenarios (e.g., some
material may be particularly prone to autofluorescence).

EXAMPLE 9 

User Interface Including Three-dimensional Representations 
of Spot Candidate 

Still another user interface feature can be presented by the system to assist in
evaluation and manipulation of the image data. FIG. 18 shows a user interface 1800,
which depicts a spot candidate in three views, 1802, 1804, and 1806. The views
assist in determining, for example, whether the spot candidate is a cluster, as is
likely the case in the illustrated example. The view 1806 is additionally processed to
give a smoother representation of the spot candidate. The horizontally-contiguous
region with the greatest area (e.g., number of pixels) is specially identified in the
views with emphasis 1808. A status line 1818 indicates information about the spot
candidate.

A strip along the interface indicates the max image, and the 15 image slices
(numbered 0-14) in binary form after thresholding. The image slice having the
greatest area region is noted at 1820. User interface controls 1832 and 1834 allow
navigation to spot candidates and identified clusters, respectively.

In view of the many possible embodiments to which the principles of the
invention may be applied, it should be recognized that the illustrated embodiments
are examples of the invention, and should not be taken as a limitation on the scope
of the invention. Rather, the scope of the invention is defined by the following
claims. We therefore claim as our invention all that comes within the scope and
spirit of these claims.






